Semi-extractive Multi-document Summarization

Authors

  • FATEMEH GHIYAFEH DAVOODI
  • Yllias Chali
Abstract

In this thesis, I design a model for extractive multi-document summarization based on the Maximum Coverage problem with a KnaPsack constraint (MCKP). The model integrates three measures to detect important sentences: Coverage, which rewards sentences according to how well they represent the whole document; Relevance, which favors sentences related to the given query; and Compression, which rewards concise sentences. To generate a summary, I apply an efficient and scalable greedy algorithm. The algorithm yields a near-optimal solution when its scoring functions are monotone non-decreasing and submodular. I use the DUC 2007 dataset to evaluate the proposed method. Evaluation with the ROUGE package shows improvement over two closely related works. The experimental results illustrate that integrating compression into the MCKP-based model, applying semantic similarity measures to compute Relevance, and defining all scoring functions as monotone submodular functions result in better summarization performance.
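
The greedy selection under a knapsack (length) budget described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the concept-set representation of sentences, the `coverage` scoring function, the `greedy_budgeted` name, and the cost-scaling exponent `r` are all assumptions made for illustration, and the thesis's Relevance and Compression terms are not modeled here. The sketch uses a plain set-coverage score because that function is monotone non-decreasing and submodular, the property the abstract names as the condition for a near-optimal greedy solution.

```python
from typing import Callable, FrozenSet, List, Sequence, Set

Sentence = FrozenSet[str]  # assumption: a sentence is a set of "concepts" (e.g. words)


def coverage(selected: Sequence[Sentence]) -> float:
    """Number of distinct concepts covered; monotone and submodular."""
    covered: Set[str] = set()
    for concepts in selected:
        covered |= concepts
    return float(len(covered))


def greedy_budgeted(
    sentences: Sequence[Sentence],
    costs: Sequence[int],          # assumption: positive costs, e.g. word counts
    budget: int,
    score: Callable[[Sequence[Sentence]], float] = coverage,
    r: float = 1.0,                # hypothetical cost-scaling exponent
) -> List[int]:
    """Pick sentence indices greedily by marginal gain per scaled cost."""
    chosen: List[int] = []
    spent = 0
    remaining = set(range(len(sentences)))
    while remaining:
        base = score([sentences[j] for j in chosen])
        best, best_ratio = None, 0.0
        for i in sorted(remaining):
            if spent + costs[i] > budget:
                continue  # would exceed the summary length budget
            gain = score([sentences[j] for j in chosen] + [sentences[i]]) - base
            ratio = gain / (costs[i] ** r)
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            break  # nothing feasible adds value
        chosen.append(best)
        spent += costs[best]
        remaining.discard(best)
    # Safeguard: compare against the best single sentence that fits the budget.
    singles = [i for i in range(len(sentences)) if costs[i] <= budget]
    if singles:
        top = max(singles, key=lambda i: score([sentences[i]]))
        if score([sentences[top]]) > score([sentences[j] for j in chosen]):
            return [top]
    return chosen
```

For example, calling `greedy_budgeted(sentences, costs, budget=3)` on a handful of toy sentences returns the indices of the selected sentences in the order they were picked. The final comparison against the best single sentence is a common safeguard in budgeted greedy selection, since pure ratio-based picking can otherwise miss one long but highly informative sentence.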

Similar resources

Using N-Grams To Understand the Nature of Summaries

Although single-document summarization is a well-studied task, the nature of multi-document summarization is only beginning to be studied in detail. While close attention has been paid to what technologies are necessary when moving from single to multi-document summarization, the properties of human-written multi-document summaries have not been quantified. In this paper, we empirically character...

Text Summarization Using Cuckoo Search Optimization Algorithm

Today, with the rapid growth of the World Wide Web and the creation of Internet sites and online text resources, text summarization has attracted the attention of many researchers. Extractive text summarization is an important summarization method that consists of selecting the top representative sentences from the input document. When facing documents with large data volumes, the extr...

SIMBA: An Extractive Multi-document Summarization System for Portuguese

This is a proposal for a demonstration of simba at PROPOR 2012. simba is an extractive multi-document summarization system that aims at producing generic summaries guided by a compression rate defined by the user. It uses a double-clustering approach to find the relevant information in a set of texts. In addition, simba uses a sentence simplification procedure as a means to ensure summary compress...

A Computationally Efficient System for High-Performance Multi-Document Summarization

We propose and develop a simple and efficient algorithm for generating extractive multi-document summaries and show that this algorithm exhibits state-of-the-art or near state-of-the-art performance on two Document Understanding Conference datasets and two Text Analysis Conference datasets. Our results show that algorithms using simple features and computationally efficient methods are competiti...

A Hybrid Approach to Multi-document Summarization of Opinions in Reviews

We present a hybrid method to generate summaries of product and services reviews by combining natural language generation and salient sentence selection techniques. Our system, STARLET-H, receives as input textual reviews with associated rated topics, and produces as output a natural language document summarizing the opinions expressed in the reviews. STARLET-H operates as a hybrid abstractive/...

Publication date: 2015